Table of contents

  • Sample mean
  • Law of large numbers (LLN)
  • Central limit theorems

25  Limit Theorems

Sample mean

Definition 25.1 (Sample mean) Let X_{1}, \dots , X_{n} be a sequence of i.i.d random variables with mean \mu and variance \sigma^{2}. Then the sample mean \bar{X}_{n} is defined as

\bar{X}_{n} = \frac{ 1 }{ n } \sum_{i = 1}^{n} X_{i}.

The sample mean is itself a random variable since it is a function of random variables. Using the linearity of expectation and the independence of X_{1}, \dots, X_{n}, the expectation and variance of the sample mean can be calculated as

\mathbb{E}_{\bar{X}_{n}} [\bar{X}_{n}] = \mathbb{E}_{X_{1}, \dots, X_{n}} \left[ \frac{ 1 }{ n } \sum_{i = 1}^{n} X_{i} \right] = \frac{ 1 }{ n } \sum_{i = 1}^{n} \mathbb{E}_{X_{i}} [X_{i}] = \frac{ 1 }{ n } n \mu = \mu,

\mathrm{Var} [\bar{X}_{n}] = \mathrm{Var} \left[ \frac{ 1 }{ n } \sum_{i = 1}^{n} X_{i} \right] = \frac{ 1 }{ n^{2} } \sum_{i = 1}^{n} \mathrm{Var} [X_{i}] = \frac{ 1 }{ n^{2} } n \sigma^{2} = \frac{ \sigma^{2} }{ n }.
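The calculations above can be checked numerically. The following is a minimal Monte Carlo sketch, assuming (purely for illustration) exponentially distributed X_{i} with \mu = 2 and \sigma^{2} = 4: the empirical mean and variance of many simulated sample means should be close to \mu and \frac{ \sigma^{2} }{ n }.

```python
import numpy as np

# Minimal Monte Carlo check of E[X_bar_n] = mu and Var[X_bar_n] = sigma^2 / n.
# Assumption for illustration: X_i ~ Exponential with scale 2, so mu = 2, sigma^2 = 4.
rng = np.random.default_rng(0)
n, n_trials = 50, 100_000

# Each row is one realization of X_1, ..., X_n; each row mean is one draw of X_bar_n.
samples = rng.exponential(scale=2.0, size=(n_trials, n))
sample_means = samples.mean(axis=1)

print("empirical E[X_bar_n]   =", sample_means.mean(), " (theory: mu = 2)")
print("empirical Var[X_bar_n] =", sample_means.var(), " (theory: sigma^2 / n =", 4 / n, ")")
```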

Law of large numbers (LLN)

There are two versions of the law of large numbers, both of which state that the sample mean of n i.i.d random variables converges to their mean \mu, that is, as n gets larger, the sample mean gets closer to \mu.

Theorem 25.1 (Weak law of large numbers (WLLN)) Let \bar{X}_{n} be the sample mean of n i.i.d random variables X_{1}, \dots , X_{n} with mean \mu. Then \bar{X}_{n} converges in probability to \mu, that is, for any \epsilon > 0,

\lim_{n \to \infty} \mathbb{P} (\lvert \bar{X}_{n} - \mu \rvert > \epsilon) = 0.

Proof

Let X_{1}, \dots, X_{n} be i.i.d random variables with finite mean \mu and finite variance \sigma^{2}. We will show that for any \epsilon > 0

\mathbb{P}_{X_{1}, \dots, X_{n}} \left( \left\lvert \frac{ \sum_{i = 1}^{n} X_{i} }{ n } - \mu \right\rvert \geq \epsilon \right) \leq \frac{ \sigma^{2} }{ n \epsilon^{2} },

from which the WLLN follows.

Applying Chebyshev's inequality (Theorem 23.2) to the sum \sum_{i = 1}^{n} X_{i}, which has mean \sum_{i = 1}^{n} \mu_{i} and, by independence, variance \sum_{i = 1}^{n} \sigma_{i}^{2}, we get the following for any t > 0 (the second line uses \mu_{i} = \mu and \sigma_{i}^{2} = \sigma^{2} since the X_{i} are i.i.d),

\begin{aligned} \mathbb{P} \left( \left\lvert \sum_{i = 1}^{n} X_{i} - \sum_{i = 1}^{n} \mu_{i} \right\rvert \geq t \right) & \leq \frac{ \sum_{i = 1}^{n} \sigma_{i}^{2} }{ t^{2} } \\ \mathbb{P} \left( \left\lvert \sum_{i = 1}^{n} X_{i} - n \mu \right\rvert \geq t \right) & \leq \frac{ n \sigma^{2} }{ t^{2} }. \\ \end{aligned}

Setting t = n \epsilon,

\begin{aligned} \mathbb{P} \left( \left\lvert \sum_{i = 1}^{n} X_{i} - n \mu \right\rvert \geq n \epsilon \right) & \leq \frac{n \sigma^{2}}{n^{2} \epsilon^{2}} \\ \mathbb{P} \left( \left\lvert \frac{ \sum_{i = 1}^{n} X_{i} }{ n } - \mu \right\rvert \geq \epsilon \right) & \leq \frac{\sigma^{2}}{n \epsilon^{2}}. \\ \mathbb{P} \left( \lvert \bar{X}_{n} - \mu \rvert \geq \epsilon \right) & \leq \frac{\sigma^{2}}{n \epsilon^{2}}. \end{aligned}

Taking the limit n \to \infty gives the WLLN,

\lim_{n \to \infty} \mathbb{P} \left( \lvert \bar{X}_{n} - \mu \rvert \geq \epsilon \right) = 0.
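The bound \mathbb{P} ( \lvert \bar{X}_{n} - \mu \rvert \geq \epsilon ) \leq \frac{ \sigma^{2} }{ n \epsilon^{2} } derived in the proof can also be observed in simulation. The sketch below is not part of the proof; it assumes Bernoulli random variables with p = 0.5 (so \mu = 0.5 and \sigma^{2} = 0.25) and estimates the probability for growing n alongside the Chebyshev bound.

```python
import numpy as np

# Sketch: estimate P(|X_bar_n - mu| >= eps) for growing n and compare it with
# the Chebyshev bound sigma^2 / (n * eps^2) used in the WLLN proof.
# Assumption for illustration: X_i ~ Bernoulli(p = 0.5), so mu = 0.5, sigma^2 = 0.25.
rng = np.random.default_rng(0)
p, eps, n_trials = 0.5, 0.1, 20_000
mu, sigma2 = p, p * (1 - p)

for n in [10, 100, 1_000, 10_000]:
    # A Binomial(n, p) draw divided by n is one realization of the sample mean.
    sample_means = rng.binomial(n, p, size=n_trials) / n
    prob = np.mean(np.abs(sample_means - mu) >= eps)
    bound = min(sigma2 / (n * eps**2), 1.0)
    print(f"n = {n:6d}   estimated probability = {prob:.4f}   Chebyshev bound = {bound:.4f}")
```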

Theorem 25.2 (Strong law of large numbers (SLLN)) Let \bar{X}_{n} be the sample mean of n i.i.d random variables X_{1}, \dots , X_{n} with mean \mu. Then \bar{X}_{n} converges almost surely to \mu,

\mathbb{P} (\lim_{n \to \infty} \bar{X}_{n} = \mu) = 1.

Proof

TODO

The WLLN is a statement about convergence in probability, while the SLLN is a statement about almost sure convergence. Since almost sure convergence implies convergence in probability, the SLLN is a stronger statement than the WLLN.
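To make the distinction concrete, the following sketch illustrates the SLLN on a single realization of the sequence: the running sample mean of simulated die rolls (an assumed example with \mu = 3.5) settles down to \mu as n grows.

```python
import numpy as np

# Sketch of the SLLN along one sample path: the running mean of a single
# sequence of die rolls (assumed example, mu = 3.5) converges to mu.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)                 # one realization of X_1, X_2, ...
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in [10, 1_000, 100_000, 1_000_000]:
    print(f"n = {n:8d}   X_bar_n = {running_mean[n - 1]:.4f}   (mu = 3.5)")
```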

Central limit theorems

Theorem 25.3 (Central limit theorem (CLT)) Let \bar{X}_{n} be the sample mean of n i.i.d random variables X_{1}, \dots , X_{n} with mean \mu and variance \sigma^{2}. As n goes to infinity, the distribution of the standardized sample mean \frac{ \bar{X}_{n} - \mu }{ \sigma / \sqrt{n} } converges to the standard Gaussian distribution; equivalently, for large n, \bar{X}_{n} approximately follows a Gaussian distribution with mean \mu and variance \frac{ \sigma^{2} }{ n },

\bar{X}_{n} \sim \mathcal{N} \left( \mu, \frac{ \sigma^{2} }{ n } \right).

Proof

TODO

Although the CLT is a statement about convergence in distribution, which is a weaker mode of convergence than convergence in probability or almost sure convergence, this does not mean that the CLT is a weaker statement than the SLLN or the WLLN: the CLT additionally describes the shape and scale of the fluctuations of \bar{X}_{n} around \mu, which the laws of large numbers do not.
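As a rough numerical illustration of Theorem 25.3 (a sketch, not a proof), the snippet below assumes Uniform(0, 1) random variables (\mu = 0.5, \sigma^{2} = \frac{1}{12}), standardizes the simulated sample means, and compares the fraction falling within 1, 2, and 3 standard deviations against the corresponding standard Gaussian probabilities.

```python
import numpy as np

# Sketch of the CLT: standardized sample means of i.i.d Uniform(0, 1) draws
# (assumed example: mu = 0.5, sigma^2 = 1/12) behave like a standard Gaussian.
rng = np.random.default_rng(0)
n, n_trials = 30, 200_000
mu, sigma = 0.5, np.sqrt(1 / 12)

sample_means = rng.uniform(0, 1, size=(n_trials, n)).mean(axis=1)
z = (sample_means - mu) / (sigma / np.sqrt(n))             # standardized sample means

for k, gaussian in [(1, 0.6827), (2, 0.9545), (3, 0.9973)]:
    frac = np.mean(np.abs(z) <= k)
    print(f"P(|Z| <= {k}) ~ {frac:.4f}   (standard Gaussian: {gaussian:.4f})")
```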
